We present AI-SDC, an integrated suite of open source Python tools to facilitate Statistical Disclosure Control (SDC) of Machine Learning (ML) models trained on confidential data prior to public release. AI-SDC combines (i) a SafeModel package that extends commonly used ML models to provide ante-hoc SDC by assessing the vulnerability of disclosure posed by the training regime; and (ii) an Attacks package that provides post-hoc SDC by rigorously assessing the empirical disclosure risk of a model through a variety of simulated attacks after training. The AI-SDC code and documentation are available under an MIT license at https://github.com/AI-SDC/AI-SDC.
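To make the idea of a post-hoc simulated attack concrete, here is a minimal sketch of a confidence-threshold membership inference test. It is not the AI-SDC API: the function names, the threshold, and the confidence values are all hypothetical, chosen only for illustration. The underlying assumption is that a model tends to be more confident on records it was trained on, so an attacker who guesses "member" for high-confidence records does better than chance when the model leaks.

```python
def threshold_attack(confidences, threshold):
    """Guess 'member' (True) when the model's confidence meets the threshold."""
    return [c >= threshold for c in confidences]


def attack_advantage(member_conf, nonmember_conf, threshold):
    """True-positive rate minus false-positive rate of the threshold attack."""
    tpr = sum(threshold_attack(member_conf, threshold)) / len(member_conf)
    fpr = sum(threshold_attack(nonmember_conf, threshold)) / len(nonmember_conf)
    return tpr - fpr


# Hypothetical confidences, made up for illustration only.
train_conf = [0.99, 0.97, 0.95, 0.80]      # records seen during training
holdout_conf = [0.90, 0.70, 0.65, 0.60]    # records never seen by the model

advantage = attack_advantage(train_conf, holdout_conf, threshold=0.92)
print(f"attack advantage: {advantage:.2f}")
```

An advantage near zero would suggest the attacker cannot tell members from non-members at that threshold, while a value well above zero is a possible sign of disclosure risk.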
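Robust classification is essential in tasks such as sign recognition for autonomous driving, where the consequences of misclassification can be severe. Adversarial attacks threaten the robustness of neural network classifiers, causing them to consistently and confidently misidentify road signs. One such attack, the shadow-based attack, causes misclassification by applying natural-looking shadows to input images, producing road signs that look natural to a human observer but confuse these classifiers. Current defenses against such attacks use a simple adversarial training procedure and achieve a rather low 25% and 40% robustness on the GTSRB and LISA test sets, respectively. In this paper, we propose a robust, fast, and generalizable method for defending against shadow attacks in the context of road sign recognition, which augments source images with binary adaptive threshold and edge maps. We empirically show its robustness against shadow attacks, and reformulate the problem to show its similarity to $\varepsilon$ perturbation-based attacks. Experimental results show that our edge defense achieves 78% robustness while maintaining 98% benign test accuracy on the GTSRB test set, with similar results from our threshold defense. A link to our code is in the paper.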
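The augmentation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which thresholding or edge operators are used, so the sketch substitutes a mean-based adaptive threshold and a forward-difference gradient, and all function names and parameters (`radius`, `offset`, `min_gradient`) are hypothetical. Images are plain nested lists of grayscale intensities.

```python
def local_mean(img, r, c, radius):
    """Mean intensity in a square window, clipped at the image border."""
    rows, cols = len(img), len(img[0])
    vals = [
        img[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    return sum(vals) / len(vals)


def adaptive_threshold(img, radius=1, offset=0.0):
    """Binary map: 1 where a pixel exceeds its neighbourhood mean minus `offset`."""
    return [
        [1 if img[r][c] > local_mean(img, r, c, radius) - offset else 0
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def edge_map(img, min_gradient=50):
    """Binary map: 1 where the forward-difference gradient magnitude is large."""
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = img[r][min(c + 1, cols - 1)] - img[r][c]
            dy = img[min(r + 1, rows - 1)][c] - img[r][c]
            if abs(dx) + abs(dy) >= min_gradient:
                edges[r][c] = 1
    return edges


def augment(img):
    """Stack (intensity, threshold bit, edge bit) per pixel as extra channels."""
    th, ed = adaptive_threshold(img), edge_map(img)
    return [
        [(img[r][c], th[r][c], ed[r][c]) for c in range(len(img[0]))]
        for r in range(len(img))
    ]


# Toy 4x4 image: dark left half, bright right half.
image = [[10, 10, 200, 200] for _ in range(4)]
# The same image uniformly darkened, a crude stand-in for a shadow.
shadowed = [[v // 2 for v in row] for row in image]
```

In this toy case the threshold map is identical for `image` and `shadowed`, because each pixel is compared only with its own neighbourhood. Real shadows darken part of an image rather than all of it, so this shows the intuition only and makes no claim about robustness.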
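Efficiently modeling long-range dependencies is an important goal in sequence modeling. Recently, models using structured state space sequence (S4) layers have achieved state-of-the-art performance on many long-range tasks. The S4 layer combines linear state space models (SSMs) with deep learning techniques and leverages the HiPPO framework for online function approximation to achieve high performance. However, this framework leads to architectural constraints and computational difficulties that make the S4 approach complicated to understand and implement. We revisit the idea that closely following the HiPPO framework is necessary for high performance. Specifically, we replace the bank of many independent single-input, single-output (SISO) SSMs used by the S4 layer with one multi-input, multi-output (MIMO) SSM with a reduced latent dimension. The reduced latent dimension of the MIMO system allows the use of efficient parallel scans, which simplify the computations required to apply the S5 layer as a sequence-to-sequence transformation. In addition, we initialize the state matrix of the S5 SSM with an approximation to the HiPPO-LegS matrix used by S4's SSMs and show that this is an effective initialization for the MIMO setting. S5 matches S4's performance on long-range tasks, including an average of 82.46% on the Long Range Arena benchmark suite, compared with 80.48% for S4 and 61.41% for the best transformer variant.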
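To illustrate why a linear recurrence can be computed with a parallel scan, the sketch below uses a single scalar state, $x_k = a\,x_{k-1} + u_k$. This is a simplifying assumption; it is not the S5 code, and the names `combine`, `sequential_states`, and `scan_states` are hypothetical. Each step is the affine map $x \mapsto a x + b$, stored as a pair `(a, b)`. Composing such maps is associative, which is the property a scan needs.

```python
def combine(left, right):
    """Compose two affine maps x -> a*x + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)


def sequential_states(a, inputs, x0=0.0):
    """Reference: run x_k = a * x_{k-1} + u_k one step at a time."""
    states, x = [], x0
    for u in inputs:
        x = a * x + u
        states.append(x)
    return states


def scan_states(a, inputs, x0=0.0):
    """Inclusive scan over the associative operator (Hillis-Steele style).

    Each round combines independent pairs, so on parallel hardware a round
    could run in one step and about log2(n) rounds suffice. Here the rounds
    run serially, purely to show that the results agree.
    """
    elems = [(a, u) for u in inputs]
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # Element k now holds the composed map taking x0 directly to x_k.
    return [ak * x0 + bk for ak, bk in elems]


inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sequential_states(0.5, inputs))
print(scan_states(0.5, inputs))
```

With a diagonal state matrix, the same operator would apply elementwise to each state dimension. That is one way to see how a diagonal parameterisation and a parallel scan fit together, though the paper remains the reference for the actual S5 computation.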
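Robots are key instruments for space exploration because they provide access to environments beyond human limitations. Jumping robot concepts are attractive solutions for negotiating complex terrain. However, among the engineering challenges that must be overcome before jumping robots can operate over sustained periods, the reduction of mechanical failure modes is one of the most fundamental. This study proposes the development of a jumping robot with a focus on minimal actuation for reduced mechanism maintenance. We present the synthesis of a Sarrus-style linkage to constrain the system in its three translational degrees of freedom without the use of typical synchronising gears. We limit the present research to vertical jumps on solid ground in order to assess the performance of the fundamental main-drive linkage. A laboratory demonstrator assists the transfer of theoretical concepts and approaches. The laboratory demonstrator performs jumps with a kinetic-energy conversion efficiency of 63%, against a theoretical maximum of 73%. Satisfactory operation opens the way to design optimisation and directional jumping capability, towards the development of a jumping robot platform for space exploration.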
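Probabilistic machine learning increasingly informs critical decisions in medicine, economics, politics, and beyond. We need evidence that the resulting decisions are well founded. To aid the development of trust in these decisions, we develop a taxonomy delineating where trust in an analysis can break down: (1) in the translation of real-world goals to goals on a particular set of training data, (2) in the translation of abstract goals on the training data to a concrete mathematical problem, (3) in the use of an algorithm to solve the stated mathematical problem, and (4) in the use of a particular code implementation of the chosen algorithm. We detail how trust can fail at each step and illustrate our taxonomy with two case studies: an analysis of the efficacy of microcredit and The Economist's predictions of the 2020 US presidential election. Finally, we describe a wide variety of methods that can be used to increase trust at each step of our taxonomy. Our taxonomy highlights the steps on which existing research on trust tends to concentrate, as well as the steps where establishing trust is particularly challenging.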
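Sleep research is essential for revealing phenotypes associated with sleep loss and mechanisms that contribute to psychopathology. Most commonly, investigators manually classify polysomnography recordings into vigilance states, which is time-consuming, requires extensive training, and is prone to inter-scorer variability. While many works have successfully developed automated vigilance state classifiers based on multiple EEG channels, we aim to produce an automated and open-access classifier that can reliably predict vigilance state from a single cortical electroencephalogram (EEG) in rodents, to minimize the disadvantages that accompany tethering small animals to computers via wires. Approximately 427 hours of continuously monitored EEG, electromyogram (EMG), and activity, out of 571 hours of total data, were labeled by a domain expert. Here we evaluate the performance of various machine learning techniques on classifying 10-second epochs into one of three discrete classes: paradoxical sleep, slow-wave sleep, or wake. Our investigation includes decision trees, random forests, naive Bayes classifiers, logistic regression classifiers, and artificial neural networks. These methods achieved accuracies ranging from approximately 74% to approximately 96%. Most notably, the random forest and the artificial neural network achieved accuracies of 95.78% and 93.31%, respectively. We have thus shown the potential of various machine learning classifiers to automatically, accurately, and reliably classify vigilance states based on a single EEG reading and a single EMG reading.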
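Some researchers speculate that intelligent reinforcement learning (RL) agents would be incentivized to seek resources and power in pursuit of their objectives. Other researchers point out that RL agents need not have human-like power-seeking instincts. To clarify this discussion, we develop the first formal theory of the statistical tendencies of optimal policies. In the context of Markov decision processes, we prove that certain environmental symmetries are sufficient for optimal policies to tend to seek power over the environment. These symmetries exist in many environments in which the agent can be shut down or destroyed. We prove that in these environments, most reward functions make it optimal to seek power by keeping a range of options available and, when maximizing average reward, by navigating towards larger sets of potential terminal states.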
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. Because the behaviour of a physical system changes over time, a DT must be continually updated with data from the physical system to reflect that changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies are considered using a production gas turbine engine system to demonstrate the digital representation accuracy for real-world, time-varying physical systems.
We introduce Argoverse 2 (AV2), a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras and two stereo cameras, in addition to lidar point clouds and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry, sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.